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Abstract. Recent attempts to constrain cosmological variation in the fine structure con- 
stant, a, using quasar absorption lines have yielded two statistical samples which initially 
appear to be inconsistent. One of these samples was subsequently demonstrated to not pass 
consistency tests; it appears that the optimisation algorithm used to fit the model to the 
spectra failed. Nevertheless, the results of the other hinge on the robustness of the spectral 
fitting program VPFIT, which has been tested through simulation but not through direct 
exploration of the likelihood function. We present the application of Markov Chain Monte 
Carlo (MCMC) methods to this problem, and demonstrate that VPFIT produces similar val- 
ues and uncertainties for Aa/a, the fractional change in the fine structure constant, as our 
MCMC algorithm, and thus that VPFIT is reliable. 
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ticular, certain absorption lines in these gas 
clouds are sensitive to changes in one or more 
of the fundamental constants of nature. For an 
atom/ion, the relativistic corrections to the en- 
ergy levels of an electron are proportional to 
a 2 , although the magnitude of the change de- 
pends on the transition under consideration. By 
comparing the wavelengths of absorption lines 
with differing sensitivities to a change in a, we 
are able to place constraints on a change in a 
between the early universe and today. That is, 
we are able to measure Aa/a = (a z - ao)/ao< 
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1. Introduction 

Recent years have seen sustained interested 
in attempting to determine whether any of 
the fundamental constants of nature vary. 
Efforts have been focused in particular on the 
fine structure constant, a, and the proton-to- 
electron mass ratio, /j, although other dimen- 
sionless constants have also been considered. 
Quasars provide a method of probing the value 
of a in the early universe by illuminating gas 
clouds along the line of sight to Earth. In par- 
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where a, is the value of a at redshift z and ao 
is the laboratory value. 

The ten tative detection of a variation in a 
reported by Webb etafl d 19991) has further in- 
creased interest in this field. Subsequent efforts 
refined this work by increasing the number of 
absorption systems considered. This yielded 
a con straint of Aa/ a = (-0.57 + 0.11) x 
10~ 5 (Murphy et al.l l2004 from 143 absorp- 
tion systems. However all of these observations 
are from the Keck/HIRES (High Resolution 
Echelle Spectrometer) instrument, and so it re- 
mains important to confirm such unexpected 
results with independent equipment. 

IChand et al] d2004 reported Aa/a = 
(-0.06 ± 0.06) x 10~ 5 from 23 measure- 
ments using the Ultraviolet and Visual Echelle 
Spectrograph (UVES), on the Very Large 
Telescope (VLT). These two measurements of 
Aa/ a are clearly discr epant. A later analy- 
sis dMurphv et al.l 120 08) found that the anal- 
ysis of IChand etaTI cannot be correct as it 
states a statistical precision in excess of the 
maximum theor etical precisio n allowed by the 
data. Similarly, Murphy et al.l ana lysed the y 2 
vs Aa/a curves produced by the IChand et al.1 
optimisation algorithm, and concluded that the 
shape of the curves demonstrate a failure of 
the algorithm to find the maximum likelihood 
value of Aa/a implied by the model fits and 
da ta, and thus t hat the estimate of Aa/a given 
by lChand et al.l is unreliable. 

Although it would appear that the 
Murp hyet al.1 results are robust, it is worth di- 
rectly investigating the optimisation algorithm 
used in order to confirm that it is reliable. 
Furthermore, each measurement of Aa/a 
will be subject to systematic errors, but some 
systematic errors should have an expectation 
value of zero, and thus averaging over many 
absorption systems will in principle eliminate 
such errors. It has become commonplace to 
quote values of Aa/a for individual systems; 
in these cases, one has no method of deter- 
mining the size of certain systematic errors 
through comparison with other systems, and 
so one should be particularly careful that 
the statistical errors are stated correctly. The 
MCMC method we describe herein allows one 



to confirm the validity of the statistical errors 
produced by a different method of analysis. 

2. Motivation for MCMC 

Ideally one would like an independent method 
of demonstrating whether or not a purported 
value of Aa/a and the associated statistical un- 
certainty given by an optimisation algorithm 
are reliable. 

Optimisation algorithms seek to minimise 
X 1 = 2/[^( x )i _ dj] 2 /<T 2 , where d; are the spec- 
troscopic data points, /(x,) is the prediction of 
the model at each data point, and <t, is the 
standard error associated with each spectro- 
scopic data point. However, x 2 — - 2 ln(L(x)) 
up to an additive constant which can be ne- 
glected, as we only ever consider differences 
of x 2 for finding model parameters. We thus 
use the definition of the likelihood function 
L(x) = exp[-^ 2 (x)/2]. 

One option is to use an alternate optimi- 
sation algorithm. Although optimisation algo- 
rithms are in principle simple, numerical is- 
sues can cause them to fail inconsistently; this 
m ay be the case for the algorithm utilised 
by lChandetail (l2004 . In particul ar, the opti- 



misation algorithms employed by Chand et al. 
d2004|) . lMurphv et alJd2008l) and lMurphv et al. 
(2004) are of the Newton type, which require 
all first and second partial derivatives of x 2 
with respect to to the parameters to be known. 
The Voigt function used to model the ab- 
sorption lines is not analytic, and nor are its 
derivatives. As such, partial derivatives must 
be approximated by finite difference meth- 
ods. Inappropriate choices of the step size for 
the finite differencing scheme can either pro- 
duce poor approximations to the derivatives 
(step size too large) or be rendered useless by 
roundoff error (step size too small), leading 
to poor performance of the optimisation algo- 
rithm. There are number of other numerical is- 
sues which may cause failure of the optimisa- 
tion algorithm, but we do not consider these 
here. 

On account of these numerical issues, one 
would desire to explore the parameter space it- 
self to directly determine the confidence limits 
on Aa/a. 
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3. Description of the MCMC method 

Traditional Monte Carlo methods suffer from 
the "curse of dimensionality". That is, their 
performance degrades exponentially with in- 
creasing dimensionality. The Markov Chain 
Monte Carlo (MCMC) method degrades only 
polynomially with increased dimensionality, at 
the expense of introducing correlation between 
samples. Additionally, MCMC methods must 
be tuned to the probability distribution under 
consideration so as to explore the parameter 
space efficiently. 

We im plement a variant of the Metropolis 
algorithm (Metropo lis et alj Il953l) to explore 
our parameter space. The Metropolis algo- 
rithm proposes a new position in the param- 
eter space, x', based on the current posi- 
tion, x, according to some proposal function, 
7\x,x')- The only requirement imposed is that 
T(x, x') = 7\x', x) (i.e the proposal distribution 
is symmetric). 

Although in principle there are large num- 
bers of possible proposal funcitons, T, in prac- 
tice the most common choice is a multidimen- 
sional Gaussian centred on the current point, 
such that x' = x + gN(Q, S) where S is the co- 
variance matrix obtained from the optimisation 
algorithm at the purported best fit solution, and 
g is a scalar tuning factor. The choice of T in- 
fluences only the efficiency of the algorithm, 
not the formal correctness of the solution. The 
initial H may or may not be a good approxima- 
tion to the true convariance matrix. The use of 
£ ensures that the distribution of proposed pa- 
rameters is approximately the same as the un- 
derlying distribution; the closer £ is to the true 
covariance matrix, the faster the MCMC algo- 
rithm will be. 

The tuning factor g effectively controls the 
size of steps taken. If g is too large, most trial 
steps will land in regions of low likelihood, 
and therefore most steps will be rejected (the 
chain will not move). On the other hand, if g is 
too small, the acceptance rate will be ss 100%, 
but the parameter space will be explored too 
slowly. If both the target and proposal distri- 
butions are Gau ssian then the idea l acceptance 
rate is * 44% dGelman et al.ll 19951) . 



The algorithm generates a sequence of 
points, {x'}, according to a two step prescrip- 
tion. First, from the current point, x, propose a 
new point, x', via T(x,x'). Then calculate the 
ratio q = L{x')/L{x). Secondly, with proba- 
bility min(<7, 1) move to the new point i.e. set 
x = x'. Otherwise, retain the current point 
i.e. x f+1 = x'. In this fashion, proposed moves 
to a point which is more likely than the existing 
point are always accepted, whereas proposed 
moves to a point which is less likely than the 
existing point are sometimes accepted, depend- 
ing on the ratio of likelihoods. For a sufficiently 
large numbers of iterations, and with proper 
tuning of the algorithm, the distribution of {x f } 
will sample from the underlying probability 
distribution, up to a normalisation constant. In 
particular, {x' Aa ,} will sample from the proba- 
bility distribution of Aa/a, from which we can 
obtain a best estimate and confidence limits. 

To minimise running time, for each model 
fit we run our MCMC algorithm several times 
(usually five to ten, depending on the com- 
plexity of the situation, with several hundred 
thousand iterations per stage) and re-estimate 
£ at each stage from the chain. Prior to starting 
each MCMC run, we execute small runs (typ- 
ically 250 iterations) in which we tune g to be 
such that the acceptance rate is between 30% 
and 50% (the outputs from these small runs 
do not count towards each stage). Thus, even 
if the initial covariance matrix does not allow a 
good exploration of the parameter space, by re- 
estimating ~L and retuning g several times we 
can drastically increase the chance that the fi- 
nal MCMC run will produce a good approxi- 
mation of the underlying probability distribu- 
tion. We determine whether the final MCMC 
run is useful by examining the chain for auto- 
correlation. If the autocorrelation length in the 
chain is much smaller than the chain length, 
then we deem the final run acceptable. 

We do not require the usual "burn-in" pe- 
riod, where one discards a certain number of 
samples from the start of the chain, because we 
believe our parameters already start at the like- 
lihood maximum. We can determine whether 
this assumption is robust by examining the 
chain - parameters should stay near their start- 
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ing values, on average, if our initial parameter 
estimates were good. 

We imp lement the Mul tiple Try Metropolis 
algorithm dLiu et al.ll2000l) . which expands the 
Metropolis algorithm to allow multiple at- 
tempts at each step. If the initial proposal dis- 
tribution is poorly tuned, this variant of the 
Metropolis algorithm tends to be much more 
robust for larger number of dimensions. 

Similarly, we do not use a Gaussian pro- 
posal distribution, but start with a radial distri- 
bution which has P(r) oc (2/3)r 2 exp(-r 2 /2) + 
(1/3) exp(-r). This mixture of an exponential 
distribution and the radial component of a 2D 
Gaussian allows the algorithm to occasionally 
take large steps, whilst otherwise taking steps 
clustered about some value; this speeds explo- 
ration of the parameter space where X is ini- 
tially poorly tuned. For our proposal distribu- 
tion, T, we generate our parameters from a 
spherically symmetric distribution with radial 
probability density P(r) and then left multiply 
by L (where LL T = E) so that the proposal dis- 
tribution has the correct covariance structure. 

MCMC can be used to directly estimate 
posterior probabilities in the Bayesian frame- 
work with the appropriate choice of prior dis- 
tribution. The likelihood ratio then becomes 
L(x) — > L(x)7r(x), where 7r(x) is the Bayesian 
prior for a particular set of parameters. We 
utilise improper flat priors for the column 
densities and redshifts of each component. 
Similarly, we utilise a flat prior on Aa/a. 

We utilise a flat prior for the logarithm 
of the Doppler parameters, rather than the 
Doppler parameters, to suppress movements 
to small b. Otherwise, the algorithm tends to 
propose many jumps to b < for narrow 
lines, which must be rejected - this substan- 
tially reduces the efficiency of the algorithm. 
This is somewhat reasonable also on physical 
grounds, as we do not expect large numbers of 
gas clouds described by arbitrarily small tem- 
peratures, and certainly our fitting procedure 
would reject a profile decomposition of this 
nature. Using a flat prior for the Doppler pa- 
rameters results in unacceptably large running 
times, and so we use the logarithmic prior as 
an easy and practical option. 



We have modified the spectral profile 
fitting program VPFIlQ to incorporate our 
MCMC algorithm. The outputs of the optimi- 
sation algorithm are fed directly into the initial- 
isation for the MCMC code. Our MCMC algo- 
rithm uses the same code as VPFIT to generate 
the Voigt profiles and calculate^ 2 , and thus our 
algorithm does not eliminate the possibilty of 
a code bug entirely. However, with this caveat, 
our algorithm can determine whether the opti- 
misation code used by VPFIT does or does not 
converge to the desired solution and produce 
appropriate uncertainty estimates. 

4. Results 

We have applied our MCMC algorithm as de- 
scribed above to the three quasar absorption 
systems described below. The numerical val- 
ues produced by the optimisation algorithm 
("VPFIT") and the MCMC code ("MCMC") 
are given in table Q] In all cases we find 
good agreement between the VPFIT result and 
that produced by our MCMC code, although 
the statistical uncertainties produced by our 
MCMC code are mildly smaller than that pro- 
duced by VPFIT, indicating that VPFIT may 
be conservative. 

Our fits all pass appropriate robustness 
tests (in particular, xl ~ 1 where v is the num- 
ber of degrees of freedom for the fit). All of our 
final chains mix well, although with the sec- 
ond and third objects considered here the initial 
chains do not - re-estimation of the covariance 
matrix multiple times is necessary to achieve a 
well mixed chain. All redshifts here refer to the 
redshift of the absorption system. 

4.1. LBQS 2206-1 958 z= 1.018 

This absorption system appears to be well 
fitted by a single Voigt profile. We use the 
Mgn AA2796, 2803 A transitions, which are 
relatively insensitive to a variation, and the 
Fe n /U/U2382, 2600, 2344, 2587A transitions, 
which are strongly sensitive. 

The parameters are approximately jointly 
normally distributed. We expect this for single 

1 See http://www.ast.cam.ac.uk/~rfc/vpfit.html 
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Table 1. Comparison of purported values of Aa/a calculated by VPFIT, and the results of the 
MCMC algorithm. Quoted uncertainties are lcr. 



Object 


Redshift 


Aa/a - VPFIT 


Aa/a - 


MCMC 


LBQS 2206-1958 
LBQS 0013-0029 
Q0551-366 


1.018 
2.029 
1.748 


(-0.51 + 1.07) x 10" 5 
(-0.86 ± 0.94) x 10 5 
(-0.80 + 1.08) x 10" 5 


(-0.51 
(-0.83 
(-0.89 


± 0.88) x 10" 5 
± 0.77) x 10" 5 
± 0.84) x 10~ 5 



11.5 12 12.5 13 

log 10 (N(2)) 

Histogram of the 



Fig.l. 

log 10 (jV(2)/cm 2 ), 



chain for 
where N(2) is the col- 
umn density of the central component to the 
fit for the z = 1 .748 absorption system toward 
Q 055 1-366. 



component fits - the Voigt profile decomposi- 
tion is effectively unique with one component. 



4.2. LBQS0013-0029z = 2.029 

This system appears with two obvious ab- 
sorption features. We find that the bluer 
feature is better fitted by two components 
than it is by one on the basis of a sta- 
tistically significant reduction in x 1 when 
using two components. That is, we fit 
three components to this absorption pro- 
file. We use a wide variety of transitions, 
namely: Sin ^1526A, Aim .LU854, 1862A, 
Fe n AAAAA23Z2, 2600, 2344, 2587, 1608A and 
Mgi/12852A. 

The chain values of Aa/a are approxi- 
mately Gaussian-distributed. 



-4x10 



-2x10 



2x10 



Aot/ a 



Fig. 2. Histogram of the chain for Aa/a for 
the z = 1.748 absorption system toward 
Q 055 1-366. 

4.3. Q0551-366z,= 1.748 

This absorption feature appears as one weak 
feature next to one relatively strong feature, 
with some overlap. We find the bluer fea- 
ture to be well modelled by one component, 
however the higher wavelength feature appears 
to require two closely spaced components to 
achieve a statistically acceptable fit. Hence, we 
use three components in total to model the 
observed profile. The transitions we use are: 
Sin ^1526A, Mgi ^2852A and Fen AAAAAA 
2382, 2600, 2344, 2587, 1608, 2374A. 

We note that the parameters corresponding 
to the two highest wavelength components are 
manifestly not normally distributed (see Fig.Q] 
for an example). We confirm, by inspection of 
the chain, that this effect is not due to per- 
mutations of corresponding parameters, which 
would leave x 1 unchanged. 

Despite the gross departures from 
Gaussianity for the column density and 
Doppler parameters corresponding to the two 
reddest components, the histogram of Aa/a 
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remains approximately Gaussian (see Fig.[2j». 
The Voigt profile decomposition is not unique, 
and so we can only try to find the model which 
best describes the observations statistically. 
However, for a given model we would naively 
expect that there is a unique value of Aa/a 
which minimises x 1 , and additionally that 
Aa/a should be approximately Gaussian. We 
find both of these statements to be true here. 

It is reassuring that we find concordance 
between the VPFIT and MCMC results for 
Aa/a given the significant non-Gaussianity in 
some parameters. For non-Gaussian param- 
eters, the parameter estimates produced by 
VPFIT will be the correct maximum likeli- 
hood estimates, however the confidence inter- 
vals will be biased. For our present purposes, 
we are only interested in the confidence limits 
on Aa/a, and here we find an acceptable level 
of agreement. 

4.4. Combination of results 

If we assume that a single value of a under- 
lies these, we can combine the three VPFIT re- 
sults above using a weighted mean to estimate 
Aa/a = (-0.74 ± 0.59) x 10~ 5 , which is sta- 
tistically consistent with no change in a. We 
use the VPFIT results as a more conservative 
estimate. 

5. Discussion & conclusion 

Given suitable knowledge of the observed dis- 
tribution of column densities and Doppler pa- 
rameters, we could implement these distribu- 
tions as priors to our model. However, our 
statistical constraints are generally sufficiently 
good that this should not significantly alter our 
parameter estimates. In any event, we are pri- 
marily interested in Aa/a, for which a flat prior 
is reasonable on ignorance grounds. 

We would like t o apply our algori thm to 
verify the results of iKing et alJ (120081) in re- 
lation to A/i/yi/; however, those fits involve 
greater than a thousand parameters each. In 
this context, our algorithm is hopelessly inad- 
equate. Our running times for the objects de- 
scribed herein varied from a few hours to a few 
days, and have only a few tens of parameters. 



Exploration of more complicated cases must 
wait for advances in computing power. 

Our results demonstrate that VPFIT pro- 
duces reliable parameters estimates and un- 
certainties for relatively simple situations. 
Experience with VPFIT suggests that there 
does not appear to be any indication of failure 
with moderately complicated circumstances, 
and so we would argue that the optimisation al- 
gorithm used by VPFIT is robust . The implica- 
tion of is this it that the results of Murpfi yet alJ 
(2004) are unlikely to be explained by some 
failure of the optimisation algorithm used by 
VPFIT. Thus the detection of a change in a 
must either be real, or due to some other un- 
known issue. 
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